error bound


Error Bounds for Learning with Vector-Valued Random Features

Neural Information Processing Systems

This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
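
To make the setting concrete, here is a minimal sketch of RF ridge regression with random Fourier features and finite-dimensional (vector-valued) outputs; the paper's theory covers infinite-dimensional input-output spaces, and every name and dimension below is an illustrative assumption, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: n samples, inputs in R^p, vector-valued outputs in R^q.
n, p, q, m = 200, 5, 3, 100                      # m = number of random features
X = rng.normal(size=(n, p))
Y = np.tanh(X @ rng.normal(size=(p, q)))         # smooth target map

# Random Fourier features approximating a Gaussian kernel.
W = rng.normal(size=(p, m))                      # sampled frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=m)
Phi = np.sqrt(2.0 / m) * np.cos(X @ W + b)       # n x m feature matrix

# RF ridge regression: minimize ||Phi C - Y||^2 + n * lam * ||C||^2.
lam = 1e-3
C = np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(m), Phi.T @ Y)

def predict(X_new):
    return np.sqrt(2.0 / m) * np.cos(X_new @ W + b) @ C

print("empirical risk:", np.mean((predict(X) - Y) ** 2))
```

The paper's bounds control how the risk of this estimator depends jointly on m (parameter complexity) and n (sample complexity), matching Monte Carlo intuition without logarithmic factors.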


Error Bounds of Imitating Policies and Environments

Neural Information Processing Systems

Imitation learning trains a policy by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated, but their theoretical understanding requires further study. In this paper, we first analyze the value gap between the expert policy and the policies learned by two imitation methods, behavioral cloning and generative adversarial imitation. The results show that generative adversarial imitation can reduce compounding errors compared to behavioral cloning, and thus has better sample complexity. Note that by treating the environment transition model as a dual agent, imitation learning can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be imitated more effectively by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation for model-based reinforcement learning. We hope these results will inspire future advances in imitation learning and model-based reinforcement learning.
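
As a point of reference, behavioral cloning is simply supervised learning on expert state-action pairs. The sketch below fits a hypothetical logistic-regression policy on toy data (not the paper's experimental setup) to show the estimator whose compounding error the paper bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert data: states in R^4, binary actions from a linear rule.
S = rng.normal(size=(500, 4))
theta = np.array([1.0, -0.5, 0.3, 0.2])
A = (S @ theta > 0).astype(float)                 # expert actions

# Behavioral cloning = maximum-likelihood fit of pi(a|s), here via
# logistic regression trained by plain gradient descent.
w, lr = np.zeros(4), 0.5
for _ in range(500):
    prob = 1.0 / (1.0 + np.exp(-(S @ w)))         # predicted P(a = 1 | s)
    w -= lr * S.T @ (prob - A) / len(A)           # cross-entropy gradient step

print("per-step imitation error:", np.mean((S @ w > 0) != A))
# A per-step error eps compounds over the effective horizon for BC
# (quadratically, in analyses of this kind), while adversarial imitation
# degrades only linearly -- the gap the paper quantifies.
```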


On Flow Matching KL Divergence

Su, Maojiang, Hu, Jerry Yao-Chieh, Pi, Sophia, Liu, Han

arXiv.org Machine Learning

We derive a deterministic, non-asymptotic upper bound on the Kullback-Leibler (KL) divergence of the flow-matching distribution approximation. In particular, if the $L_2$ flow-matching loss is bounded by $\varepsilon^2 > 0$, then the KL divergence between the true data distribution and the estimated distribution is bounded by $A_1 \varepsilon + A_2 \varepsilon^2$. Here, the constants $A_1$ and $A_2$ depend only on the regularities of the data and velocity fields. Consequently, this bound implies statistical convergence rates of Flow Matching Transformers under the Total Variation (TV) distance. We show that flow matching achieves nearly minimax-optimal efficiency in estimating smooth distributions. Our results make the statistical efficiency of flow matching comparable to that of diffusion models under the TV distance. Numerical studies on synthetic and learned velocities corroborate our theory.
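
The quantity the bound controls is the $L_2$ flow-matching loss. The following sketch estimates that loss by Monte Carlo for a linear (conditional) probability path, with a crude constant stand-in for the learned velocity field; it is an illustration under assumed conventions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo estimate of the L2 flow-matching loss for the linear
# conditional path x_t = (1 - t) * x0 + t * x1, with x0 ~ N(0, I).
def fm_loss(v, data, n_mc=4096):
    x1 = data[rng.integers(len(data), size=n_mc)]
    x0 = rng.normal(size=x1.shape)
    t = rng.uniform(size=(n_mc, 1))
    xt = (1.0 - t) * x0 + t * x1
    u = x1 - x0                                  # conditional target velocity
    return np.mean(np.sum((v(xt, t) - u) ** 2, axis=1))

data = rng.normal(loc=2.0, size=(5000, 2))       # toy target: N(2, I)
v_hat = lambda x, t: np.full_like(x, 2.0)        # crude constant stand-in field
eps2 = fm_loss(v_hat, data)
print("loss eps^2:", eps2)                       # bound: KL <= A1*eps + A2*eps^2
```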


The Price of Linear Time: Error Analysis of Structured Kernel Interpolation

Moreno, Alexander, Xiao, Justin, Mei, Jonathan

arXiv.org Machine Learning

Structured Kernel Interpolation (SKI) (Wilson et al. 2015) helps scale Gaussian Processes (GPs) by approximating the kernel matrix via interpolation at inducing points, achieving linear computational complexity. However, it lacks rigorous theoretical error analysis. This paper bridges the gap: we prove error bounds for the SKI Gram matrix and examine the error's effect on hyperparameter estimation and posterior inference. We further provide a practical guide to selecting the number of inducing points under convolutional cubic interpolation: they should grow as $n^{d/3}$ for error control. Crucially, we identify two dimensionality regimes governing the trade-off between SKI Gram matrix spectral norm error and computational complexity. For $d \leq 3$, any error tolerance can achieve linear time for sufficiently large sample size. For $d > 3$, the error must increase with sample size to maintain linear time. Our analysis provides key insights into SKI's scalability-accuracy trade-offs, establishing precise conditions for achieving linear-time GP inference with controlled approximation error.
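
The $n^{d/3}$ guideline is easy to operationalize. The helper below (the constant factor `c` is a hypothetical tuning knob, not from the paper) also makes the two regimes visible: for $d > 3$ the required number of inducing points quickly exceeds $n$ itself.

```python
import numpy as np

# Inducing-point rule of thumb under convolutional cubic interpolation:
# grow m like n^(d/3) to control the SKI Gram matrix error.
def num_inducing_points(n, d, c=1.0):
    return int(np.ceil(c * n ** (d / 3)))

for n in (10**3, 10**4, 10**5):
    for d in (1, 2, 3, 4):
        m = num_inducing_points(n, d)
        print(f"n={n:>6}, d={d}: m={m:>8}{'  (m > n!)' if m > n else ''}")
# For d <= 3, m grows no faster than n, so linear time is attainable at any
# error tolerance; for d > 3, m must outgrow n, so the error must increase
# with sample size to keep linear time.
```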


Review for NeurIPS paper: Error Bounds of Imitating Policies and Environments

Neural Information Processing Systems

Weaknesses: The approach used as a point of comparison for BC seems out of date, as there are many more recent approaches that address the compounding error problem and have been shown to achieve better results in imitation learning, such as MaxEnt IRL and other more recent IRL methods. Reactive policy matching can lead to bad states, which is why value iteration and Q-learning are used to evaluate state-action pairs in terms of future cumulative rewards. For example, MaxEnt IRL uses value iteration to capture the feature preferences of the experts and model their behavior in unseen settings, with the explicit goal of maximizing the entropy of the policy distribution over near-optimal state-action pairs rather than matching the expert policy directly. The learned policies may differ from expert demonstrations as long as the state-action features being optimized by the learner are similar to those of the demonstrator. In other words, there are multiple near-optimal policies that would be acceptable and are still indicative of the demonstrated behavior; learning these reduces the effect of compounding errors and allows for deviations from the expert state distribution.


Classification Error Bound for Low Bayes Error Conditions in Machine Learning

Yang, Zijian, Eminyan, Vahe, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine Learning

In statistical classification and machine learning, classification error is an important performance measure, and it is minimized by the Bayes decision rule. In practice, the unknown true distribution is usually replaced in the Bayes decision rule by a model distribution estimated from the training data. This substitution introduces a mismatch between the Bayes error and the model-based classification error. In this work, we apply classification error bounds to study the relationship between this error mismatch and the Kullback-Leibler divergence in machine learning. Motivated by recent observations of low model-based classification errors in many machine learning tasks, which imply that the Bayes error is low as well, we propose a linear approximation of the classification error bound for low Bayes error conditions. Bounds involving class priors are then discussed. Moreover, we extend the classification error bound to sequences. Using automatic speech recognition as a representative example of machine learning applications, this work analytically discusses the correlations among different performance measures under the extended bounds, including cross-entropy loss, language model perplexity, and word error rate.
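
For intuition on the error-mismatch/KL relationship, the toy computation below contrasts the true error of a mismatched decision rule with the Bayes error and a classical Pinsker-style envelope $\sqrt{2\,\mathrm{KL}}$; this is a textbook-style illustration, not the paper's refined low-Bayes-error bound.

```python
import numpy as np

# Toy two-class problem at a single feature point x.
p = np.array([0.45, 0.55])        # true posterior: Bayes rule picks class 1
q = np.array([0.60, 0.40])        # model posterior: its rule picks class 0

bayes_err = 1.0 - p.max()                 # error of the Bayes decision rule
model_err = 1.0 - p[np.argmax(q)]         # true error of the model's decision
kl = float(np.sum(p * np.log(p / q)))     # KL(p || q)

# Pinsker-style envelope: error mismatch <= sqrt(2 * KL).
print(model_err - bayes_err, "<=", np.sqrt(2.0 * kl))
```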


Error Bounds For Gaussian Process Regression Under Bounded Support Noise With Applications To Safety Certification

Reed, Robert, Laurenti, Luca, Lahijanian, Morteza

arXiv.org Machine Learning

Gaussian Process Regression (GPR) is a powerful and elegant method for learning complex functions from noisy data with a wide range of applications, including in safety-critical domains. Such applications have two key features: (i) they require rigorous error quantification, and (ii) the noise is often bounded and non-Gaussian due to, e.g., physical constraints. While error bounds for applying GPR in the presence of non-Gaussian noise exist, they tend to be overly restrictive and conservative in practice. In this paper, we provide novel error bounds for GPR under bounded support noise. Specifically, by relying on concentration inequalities and assuming that the latent function has low complexity in the reproducing kernel Hilbert space (RKHS) corresponding to the GP kernel, we derive both probabilistic and deterministic bounds on the error of the GPR. We show that these errors are substantially tighter than existing state-of-the-art bounds and are particularly well-suited for GPR with neural network kernels, i.e., Deep Kernel Learning (DKL). Furthermore, motivated by applications in safety-critical domains, we illustrate how these bounds can be combined with stochastic barrier functions to successfully quantify the safety probability of an unknown dynamical system from finite data. We validate the efficacy of our approach through several benchmarks and comparisons against existing bounds. The results show that our bounds are consistently smaller, and that DKLs can produce error bounds tighter than sample noise, significantly improving the safety probability of control systems.
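
To illustrate the kind of guarantee at stake, here is a toy GP regression under bounded (uniform) noise with a classical RKHS-style error envelope $|f(x) - \mu_n(x)| \le \beta\,\sigma_n(x)$; the constant $\beta$ below is an illustrative placeholder, not the tighter bounded-noise constant the paper derives.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, ls=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

# Toy 1-D regression with bounded-support (uniform) noise.
f = lambda x: np.sin(3.0 * x).ravel()
X = rng.uniform(-1.0, 1.0, size=(30, 1))
y = f(X) + rng.uniform(-0.1, 0.1, size=30)

sigma2 = 0.01                                     # noise-variance proxy
K = rbf(X, X) + sigma2 * np.eye(len(X))
Xs = np.linspace(-1.0, 1.0, 200)[:, None]
Ks = rbf(Xs, X)
mu = Ks @ np.linalg.solve(K, y)                   # posterior mean
cov_term = np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
sd = np.sqrt(np.maximum(rbf(Xs, Xs).diagonal() - cov_term, 0.0))

# RKHS-style envelope |f(x) - mu(x)| <= beta * sd(x); beta = 2 is a
# placeholder, whereas the paper derives tighter constants for bounded noise.
beta = 2.0
print("worst violation:", np.max(np.abs(f(Xs) - mu) - beta * sd))
```

Bounds of this form are what get combined with stochastic barrier functions to certify the safety probability of an unknown dynamical system from finite data.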


Error Bounds for Transductive Learning via Compression and Clustering

Neural Information Processing Systems

In contrast to inductive learning, in the transductive setting the learner is given both the training and test sets prior to learning. The goal of the learner is to infer (or "transduce") the labels of the test points. The transduction setting was introduced by Vapnik [1, 2], who proposed basic bounds and an algorithm for this setting. Clearly, inferring the labels of points in the test set can be done using an inductive scheme. However, as pointed out in [2], it makes little sense to solve an easier problem by 'reducing' it to a much more difficult one. In particular, the prior knowledge carried by the (unlabeled) test points can be incorporated into an algorithm, potentially leading to superior performance. Indeed, a number of papers have demonstrated empirically that transduction can offer substantial advantage over induction whenever the training set is small or moderate (see e.g.